253 research outputs found
Discriminative latent variable models for visual recognition
Visual recognition is a central problem in computer vision, and it has numerous potential applications in many different fields, such as robotics, human-computer interaction, and entertainment. In this dissertation, we propose two discriminative latent variable models for handling challenging visual recognition problems. In particular, we use latent variables to capture and model various forms of prior knowledge in the training data. In the first model, we address the problem of recognizing human actions from still images. We jointly consider both poses and actions in a unified framework, and treat human poses as latent variables. The learning of this model follows the framework of latent SVM. Secondly, we propose another latent variable model to address the problem of automated tag learning on YouTube videos. In particular, we address the semantic variations (sub-tags) of the videos which have the same tag. In the model, each video is assumed to be associated with a sub-tag label, and we treat this sub-tag label as latent information. This model is trained using a latent learning framework based on LogitBoost, which jointly considers both the latent sub-tag label and the tag label. Moreover, we propose a novel discriminative latent learning framework, kernel latent SVM, which combines the benefits of latent SVM and kernel methods. The framework of kernel latent SVM is general enough to be applied in many applications of visual recognition. It is also able to handle complex latent variables with interdependent structures using composite kernels.
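As a rough illustration of the latent SVM idea mentioned in this abstract, the sketch below scores an example by maximizing a linear score over candidate values of the latent variable and then applies a hinge loss. It is a minimal sketch of the generic linear, binary formulation, not the dissertation's model: the function names, the feature layout (one feature vector per candidate latent value, for example per candidate pose), and the toy numbers are all assumptions made for illustration.

```python
def latent_score(w, candidate_features):
    """Score an example as the max over latent values h of w . phi(x, h).

    candidate_features holds one feature vector phi(x, h) per candidate
    latent value h (hypothetical layout, e.g. one vector per candidate pose).
    Returns (best_score, index_of_best_latent_value).
    """
    scores = [sum(wi * fi for wi, fi in zip(w, phi)) for phi in candidate_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


def latent_hinge_loss(w, candidate_features, label):
    """Hinge loss on the latent-maximized score; label is +1 or -1."""
    score, _ = latent_score(w, candidate_features)
    return max(0.0, 1.0 - label * score)


# Toy example: three candidate latent values for one image.
w = [1.0, -1.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [3.0, 1.0]]
best_score, best_h = latent_score(w, candidates)
```

In this toy setup the third candidate gives the highest score, so it is the latent value selected at inference time. Training alternates between such inference steps and updates of `w`, which is omitted here.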
Pose Embeddings: A Deep Architecture for Learning to Match Human Poses
We present a method for learning an embedding that places images of humans in
similar poses nearby. This embedding can be used as a direct method of
comparing images based on human pose, avoiding potential challenges of
estimating body joint positions. Pose embedding learning is formulated under a
triplet-based distance criterion. A deep architecture is used to allow learning
of a representation capable of making distinctions between different poses.
Experiments on human pose matching and retrieval from video data demonstrate
the potential of the method.
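The abstract does not state the exact form of its triplet-based distance criterion. One common choice is a margin-based hinge on Euclidean distances, sketched below under that assumption. The function names, the margin value, and the two-dimensional toy embeddings are hypothetical; in practice the embeddings would come from the deep network.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_hinge_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss (an assumed form, not taken from the paper).

    The loss is zero once the anchor is closer to the positive (similar pose)
    than to the negative (different pose) by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Violated triplet: the "similar pose" embedding is farther than the negative.
loss_bad = triplet_hinge_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
# Satisfied triplet: the positive is much closer than the negative.
loss_good = triplet_hinge_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
```

Minimizing a loss of this kind over many triplets pulls images of similar poses together in the embedding space and pushes different poses apart.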
Video Thumbnail Selection Based on Deep Learning
Video thumbnails are often the first thing a viewer sees when browsing or searching for videos. A frame that is visually representative of the video is typically selected and used as a thumbnail representation of the video. Sometimes, such a thumbnail is not an adequate semantic representation of the video. Further, it is possible that such a thumbnail is not visually pleasing. This disclosure describes deep learning techniques to select video thumbnails that are visually attractive and reflect the content of a video. Thumbnails as described in this disclosure are attractive, improve the likelihood of user selection, and help users find relevant content easily.
A Dimension-Augmented Physics-Informed Neural Network (DaPINN) with High Level Accuracy and Efficiency
Physics-informed neural networks (PINNs) have been widely applied in
different fields due to their effectiveness in solving partial differential
equations (PDEs). However, the accuracy and efficiency of PINNs need to be
considerably improved for scientific and commercial use. To address this issue,
we systematically propose a novel dimension-augmented physics-informed neural
network (DaPINN), which simultaneously and significantly improves the accuracy
and efficiency of the PINN. In the DaPINN model, we introduce inductive bias in
the neural network to enhance network generalizability by adding a special
regularization term to the loss function. Furthermore, we manipulate the
network input dimension by inserting additional sample features and
incorporating the expanded dimensionality in the loss function. Moreover, we
verify the effectiveness of power series augmentation, Fourier series
augmentation and replica augmentation, in both forward and backward problems.
In most experiments, the error of DaPINN is 1 to 2 orders of magnitude lower
than that of PINN. The results show that the DaPINN outperforms the original
PINN in terms of both accuracy and efficiency with a reduced dependence on the
number of sample points. We also discuss the complexity of the DaPINN and its
compatibility with other methods. Comment: 33 pages, 12 figures
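The abstract names three ways of augmenting the network input (power series, Fourier series, and replica augmentation) without giving their formulas. The sketch below shows one plausible reading for a scalar coordinate. The function names, the feature ordering, and the choice to keep the original coordinate as the first feature are assumptions, and the additional loss terms over the expanded dimensions described in the abstract are not shown.

```python
import math


def power_augment(x, order):
    """Power series features [x, x^2, ..., x^order] (assumed form)."""
    return [x ** k for k in range(1, order + 1)]


def fourier_augment(x, n_freq):
    """Features [x, sin(x), cos(x), ..., sin(n x), cos(n x)] (assumed form)."""
    features = [x]
    for k in range(1, n_freq + 1):
        features.append(math.sin(k * x))
        features.append(math.cos(k * x))
    return features


def replica_augment(x, copies):
    """Repeat the original coordinate `copies` times (assumed form)."""
    return [x] * copies


# Each helper turns a 1-D input into a higher-dimensional network input.
p = power_augment(2.0, 3)
f = fourier_augment(0.0, 2)
r = replica_augment(0.5, 3)
```

Under this reading, the network receives the augmented vector in place of the raw coordinate, so its input dimension grows with the chosen order or number of frequencies.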
Multi-task super resolution method for vector field critical points enhancement
It is challenging to visualize vector fields at local critical points. Topology-based methods generally first divide critical regions into different categories and then process each type of critical region separately to improve the result, which makes the pipeline complex. In this paper, a learning-based multi-task super-resolution (SR) method is proposed to refine vector fields and enhance the visualization, especially in critical regions. Specifically, the multi-task model has two task branches: one simulates the interpolation of discrete vector fields using an improved super-resolution network, and the other is a classification task that identifies the types of critical vector fields. The architecture is efficient and end-to-end for both training and inference, which simplifies the pipeline of critical vector field visualization and improves the visualization. In experiments, we compare our method with both traditional interpolation and a pure SR network on both simulated and real data, and the reported results indicate that our method lowers the error and improves PSNR significantly.
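PSNR, the metric cited in this abstract, is conventionally defined as 10 log10(peak^2 / MSE). The sketch below computes it for two flattened arrays of values. How the paper chooses the peak value for vector field components is not stated, so the `peak` argument and the function names here are assumptions.

```python
import math


def mean_squared_error(reference, estimate):
    """Mean squared error between two equal-length sequences of values."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def psnr(reference, estimate, peak):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).

    `peak` is the maximum possible value of the signal; its choice for
    vector field components is an assumption left to the caller.
    Returns infinity when the two inputs are identical.
    """
    mse = mean_squared_error(reference, estimate)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy check: every sample is off by 1 and the peak value is 10.
value = psnr([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], peak=10.0)
```

A higher PSNR means a lower reconstruction error relative to the signal range, which is why the abstract reports lower error and higher PSNR together.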